A Post by Michael B. Spring
Scholarship in a Digital World (December 1, 2007)
I just went back to read the report of a workshop on "Building
the Infrastructure for CyberScholarship". The workshop was funded by NSF and
sets out a roadmap for research for the next decade or so. The work is solid,
but it left me feeling like I sometimes do with my students. Good answer, but
the wrong question was asked. Let me be a little clearer about what I mean.
The findings of the workshop make a lot of sense, but in some ways they are too
driven by a shared blurred vision. OK, still not clear. A number of workshop
participants are notable researchers doing great work in particular areas -- and
they have been for a decade or so. I get the sense that as the workshop went
on, some of the participants were trying to understand the visions of others so
as to prepare a plan for what needs to be done next. The problem is that they
were talking about different aspects of a big problem and trying to develop
solutions that solved all the problems. This is a situation in which I say to
my students, "don't just do something, stand there", which is my second most
favorite piece of advice. You guessed it, the first is "don't just stand there,
do something". The secret is knowing which to do first.
OK, let me try to say a little bit about what I am thinking. First of all,
we should be talking about scholarship, not cyberscholarship. I would hold that
while some aspects of computational scholarship change in a digital environment,
this is far from the top of the list of what people are talking about here. In
talking about scholarship, what are the new opportunities provided? My guess is
that there are about a dozen and that by segmenting the problem into its component
pieces, we have a better chance of building solutions that make sense. Without
an effort to be comprehensive, here is my starting list, beginning with the
low-hanging fruit:
- Large Data Sets
- Large Symbolic Data Sets: We are entering an era when scientists have enormous data
sets from multiple sources that they want to work with. I would place in this
category things like the Human Genome Project. Also in this category would be
many of the GIS data sets. These represent data sets of
unparalleled size that are manipulable by computer processing.
- Aggregated Large Data Sets: I am thinking here of sets that are not
necessarily collected as a large data set but that might be usefully processed
as aggregate data sets. While the end use demands of these sets are similar to
the first category, these represent a different kind of problem on the front end:
deciding on common semantics for the data or providing for translations
in aggregation.
- Large Raster Data Sets: These may provide the largest sets of data, whether
they be as esoteric as Hubble Space Telescope images or as mundane as MRIs of
knees. There are significant problems related to image processing and
normalization of image data that need to be addressed here.
- Digital Collections of Results
- Live Reports: Readers of research frequently wonder how a given conclusion
in an article might change if data were manipulated differently. It is now
possible to have live data associated with a final report such that "what if"
questions could be asked.
- Undiscovered Public Information: Much of the scholarly literature is
located in silos such that a search in one space fails to find information
stored in another space. It should be possible to create massive cross-domain
indexes that reduce the potential of public information being undiscovered.
- Collaborative Documents: It is possible in digital environments to have
documents serve as the center of collaboration. As new document forms based on
XML emerge, it will be possible for researchers to truly collaborate on some
activity through the documents that support it.
- Aggregate Information
- Social Tagging: There are a number of recent developments that allow the
astute observer to gather information about artifacts based on distributed and
uncontrolled human observations. For example, Flickr allows us to harvest human
tags associated with images without any central control point.
- Social Rating: From Delicious to LinkedIn, users are expressing their
opinions about and their ratings of various resources. These can tell us how
people rate materials, both directly and indirectly. This information can be
mined and used in new ways.
- Social Systems: Social systems themselves provide an opportunity for study
of new kinds of relationships and new forms of social behavior.
- Your Thing -- what have I left out??
- ???:
- ???:
- ???: